Generalization of ERM in Stochastic Convex Optimization: The Dimension Strikes Back
Abstract
In stochastic convex optimization the goal is to minimize a convex function F(x) := E_{f∼D}[f(x)] over a convex set K ⊂ ℝ^d, where D is some unknown distribution and each f(·) in the support of D is convex over K. The optimization is commonly based on i.i.d. samples f^1, f^2, ..., f^n from D. A standard approach to such problems is empirical risk minimization (ERM), which optimizes F_S(x) := (1/n) ∑_{i≤n} f^i(x). Here we consider the question of how many samples are necessary for ERM to succeed and the closely related question of uniform convergence of F_S to F over K. We demonstrate that in the standard ℓ_p/ℓ_q setting of Lipschitz-bounded functions over a K of bounded radius, ERM requires a sample size that scales linearly with the dimension d. This nearly matches standard upper bounds and improves on the Ω(log d) dependence proved for the ℓ_2/ℓ_2 setting in [SSSS09]. In stark contrast, these problems can be solved using a dimension-independent number of samples for the ℓ_2/ℓ_2 setting and with log d dependence for the ℓ_1/ℓ_∞ setting using other approaches. We also demonstrate that for a more general class of range-bounded (but not Lipschitz-bounded) stochastic convex programs an even stronger gap appears already in dimension 2.
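To fix notation, here is a minimal numerical sketch of the ERM setup described above (not the paper's lower-bound construction): D is taken to be a distribution over convex losses f(x) = |⟨a, x⟩ − b| with random (a, b), K is the unit Euclidean ball, and F_S is minimized by projected subgradient descent. The choice of losses, the step sizes, and all names are illustrative assumptions.

# Minimal ERM sketch: sample n convex losses from a hypothetical distribution D,
# then minimize their empirical average F_S over the unit ball K by projected
# subgradient descent. Purely illustrative; not the construction from the paper.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 200                        # dimension and sample size

A = rng.normal(size=(n, d))           # each row a_i defines one sampled loss f_i
b = rng.normal(size=n)

def project_ball(x):
    # Euclidean projection onto K = {x : ||x||_2 <= 1}
    norm = np.linalg.norm(x)
    return x if norm <= 1.0 else x / norm

def empirical_risk(x):
    # F_S(x) = (1/n) sum_i |<a_i, x> - b_i|
    return np.mean(np.abs(A @ x - b))

def subgradient(x):
    # subgradient of F_S at x: (1/n) sum_i sign(<a_i, x> - b_i) a_i
    r = A @ x - b
    return (A * np.sign(r)[:, None]).mean(axis=0)

x = np.zeros(d)
for t in range(1, 2001):
    x = project_ball(x - (0.5 / np.sqrt(t)) * subgradient(x))

print("empirical risk of ERM iterate:", empirical_risk(x))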
Similar Resources
Empirical Risk Minimization for Stochastic Convex Optimization: O(1/n)- and O(1/n^2)-type of Risk Bounds
Although there exist plentiful theories of empirical risk minimization (ERM) for supervised learning, current theoretical understandings of ERM for a related problem, stochastic convex optimization (SCO), are limited. In this work, we strengthen the realm of ERM for SCO by exploiting smoothness and strong convexity conditions to improve the risk bounds. First, we establish an Õ(d/n + √(F∗/n)) risk...
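As a toy illustration of the smooth and strongly convex regime this snippet refers to (my own example, not the paper's analysis): ridge regression is an ERM problem whose empirical objective is smooth and, thanks to the explicit ℓ_2 regularizer, strongly convex, which is the setting in which faster O(1/n)-type risk bounds are typically obtained.

# Ridge regression as a smooth and strongly convex ERM objective (illustrative):
#   F_S(x) = (1/n)||Ax - y||^2 + lam*||x||^2, solved in closed form.
import numpy as np

rng = np.random.default_rng(1)
d, n, lam = 10, 100, 0.1
A = rng.normal(size=(n, d))
y = A @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# ERM solution: (A^T A / n + lam*I) x = A^T y / n
x_erm = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ y / n)
print("regularized empirical risk:",
      np.mean((A @ x_erm - y) ** 2) + lam * x_erm @ x_erm)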
An Accelerated Proximal Coordinate Gradient Method
We develop an accelerated randomized proximal coordinate gradient (APCG) method, for solving a broad class of composite convex optimization problems. In particular, our method achieves faster linear convergence rates for minimizing strongly convex functions than existing randomized proximal coordinate gradient methods. We show how to apply the APCG method to solve the dual of the regularized em...
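For intuition, here is a sketch of the unaccelerated building block, a randomized proximal coordinate gradient step, applied to a Lasso-type composite objective. The APCG method described in the paper adds acceleration on top of steps like these; the problem instance and all constants below are purely illustrative.

# Randomized proximal coordinate gradient (no acceleration) for the composite problem
#   min_x (1/2n)||Ax - y||^2 + lam*||x||_1   (Lasso as a stand-in example)
import numpy as np

rng = np.random.default_rng(2)
n, d, lam = 200, 50, 0.1
A = rng.normal(size=(n, d))
y = A @ (rng.normal(size=d) * (rng.random(d) < 0.2)) + 0.01 * rng.normal(size=n)

L = (A ** 2).sum(axis=0) / n          # coordinate-wise Lipschitz constants
x = np.zeros(d)
residual = A @ x - y                  # maintained incrementally

def soft_threshold(v, t):
    # proximal operator of t*|.| at v
    return np.sign(v) * max(abs(v) - t, 0.0)

for _ in range(20000):
    j = rng.integers(d)                       # pick a random coordinate
    g_j = A[:, j] @ residual / n              # partial gradient of the smooth part
    x_new = soft_threshold(x[j] - g_j / L[j], lam / L[j])   # prox step on |x_j|
    residual += A[:, j] * (x_new - x[j])      # cheap residual update
    x[j] = x_new

print("objective:", 0.5 * np.mean((A @ x - y) ** 2) + lam * np.abs(x).sum())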
Learning From An Optimization Viewpoint
Optimization has always played a central role in machine learning, and advances in the field of optimization and mathematical programming have greatly influenced machine learning models. However, the connection between optimization and learning is much deeper: one can phrase statistical and online learning problems directly as corresponding optimization problems. In this dissertation I take this...
A generalized form of the Hermite-Hadamard-Fejer type inequalities involving fractional integral for co-ordinated convex functions
Recently, a general class of the Hermite-Hadamard-Fejer inequality on convex functions was studied in [H. Budak, March 2019, 74:29, Results in Mathematics]. In this paper, we establish a generalization of the Hermite-Hadamard-Fejer inequality for fractional integrals based on co-ordinated convex functions. Our results generalize and improve several inequalities obtained in earlier studies.
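For orientation, the classical one-variable Hermite-Hadamard inequality that such results extend states that, for any convex f on [a, b],

f((a + b)/2) ≤ (1/(b − a)) ∫_a^b f(x) dx ≤ (f(a) + f(b))/2.

The Fejér and fractional-integral versions referred to above are weighted and generalized-integral refinements of this statement.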